

A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning

Neural Information Processing Systems

Real-world datasets are typically imbalanced in the sense that only a few classes have numerous samples, while many classes are associated with only a few samples. As a result, a naive ERM learning process will be biased towards the majority classes, making it difficult to generalize to the minority classes. To address this issue, one simple but effective approach is to modify the loss function to emphasize the learning on minority classes, such as re-weighting the losses or adjusting the logits via class-dependent terms. However, existing generalization analysis of such losses is still coarse-grained and fragmented, failing to explain some empirical results. To bridge this gap between theory and practice, we propose a novel technique named data-dependent contraction to capture how these modified losses handle different classes. On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment in a unified manner. Furthermore, a principled learning algorithm is developed based on the theoretical insights. Finally, the empirical results on benchmark datasets not only validate the theoretical results but also demonstrate the effectiveness of the proposed method.
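The two loss modifications the abstract names can be sketched concretely. Below is a minimal NumPy illustration, not the paper's method: `reweighted_ce` scales each sample's cross-entropy by an inverse-frequency class weight, and `logit_adjusted_ce` shifts logits by a class-prior term before the softmax. The weighting scheme and the `tau` parameter are illustrative assumptions.

```python
import numpy as np

def reweighted_ce(logits, y, class_counts):
    """Re-weighting sketch: scale each sample's cross-entropy loss by an
    inverse-frequency weight, so minority-class errors count more."""
    weights = class_counts.sum() / (len(class_counts) * class_counts)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.mean(weights[y] * -logp[np.arange(len(y)), y]))

def logit_adjusted_ce(logits, y, class_counts, tau=1.0):
    """Logit-adjustment sketch: add tau * log(class prior) to the logits
    before the softmax, penalising over-confident majority predictions."""
    prior = class_counts / class_counts.sum()
    adj = logits + tau * np.log(prior)
    logp = adj - np.log(np.exp(adj).sum(axis=1, keepdims=True))
    return float(np.mean(-logp[np.arange(len(y)), y]))
```

With an imbalanced count vector, the logit-adjusted loss is larger than the plain cross-entropy (`tau=0`) whenever minority-class samples are present, which is the intended emphasis on minority classes.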


Understanding the Evolution of Linear Regions in Deep Reinforcement Learning

Neural Information Processing Systems

Policies produced by deep reinforcement learning are typically characterised by their learning curves, but they remain poorly understood in many other respects. ReLU-based policies result in a partitioning of the input space into piecewise linear regions. We seek to understand how observed region counts and their densities evolve during deep reinforcement learning using empirical results that span a range of continuous control tasks and policy network dimensions. Intuitively, we may expect that during training, the region density increases in the areas that are frequently visited by the policy, thereby affording fine-grained control. We use recent theoretical and empirical results for the linear regions induced by neural networks in supervised learning settings for grounding and comparison of our results. Empirically, we find that the region density increases only moderately throughout training, as measured along fixed trajectories coming from the final policy. However, the trajectories themselves also increase in length during training, and thus the region densities decrease as seen from the perspective of the current trajectory. Our findings suggest that the complexity of deep reinforcement learning policies does not principally emerge from a significant growth in the complexity of functions observed on-and-around trajectories of the policy.
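The region counts discussed above can be measured via ReLU activation patterns: two inputs lie in the same linear region of a one-hidden-layer ReLU network exactly when they induce the same on/off pattern of hidden units. A minimal sketch of this counting procedure, under the assumption of a single hidden layer (the paper's networks may be deeper):

```python
import numpy as np

def activation_pattern(W1, b1, x):
    """Binary on/off pattern of the hidden ReLU units for input x;
    inputs sharing a pattern lie in the same linear region."""
    return tuple((W1 @ x + b1 > 0).astype(int))

def count_regions_along(W1, b1, trajectory):
    """Number of distinct linear regions visited along a sequence of
    states, i.e. the number of distinct activation patterns."""
    return len({activation_pattern(W1, b1, x) for x in trajectory})
```

Dividing this count by the trajectory length gives the region density measured along a trajectory, the quantity whose evolution the abstract describes.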





Many thanks to the reviewers for their deep, thoughtful reviews and constructive suggestions

Neural Information Processing Systems

We note that, despite very recent observations on the empirical superiority of adaptive synchronization (e.g., …), it would surely be interesting to see if our bound can be tightened. R1, clarification on the log T communication rounds: for local SGD with periodic averaging, the proof techniques are more involved. We do not tune the learning rate.
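For context, local SGD with periodic averaging, the setting the response refers to, can be sketched as follows. This is a generic illustration, not the authors' algorithm: each worker takes several local SGD steps on a noisy gradient oracle, then all copies are averaged once per communication round.

```python
import numpy as np

def local_sgd(grad, w0, n_workers=4, rounds=5, local_steps=10, lr=0.1, seed=0):
    """Local SGD sketch: workers run `local_steps` noisy-gradient updates
    on private copies, then synchronise by averaging once per round."""
    rng = np.random.default_rng(seed)
    workers = [w0.copy() for _ in range(n_workers)]
    for _ in range(rounds):
        for k in range(n_workers):
            for _ in range(local_steps):
                noise = rng.normal(scale=0.01, size=w0.shape)
                workers[k] = workers[k] - lr * (grad(workers[k]) + noise)
        avg = sum(workers) / n_workers      # periodic averaging step
        workers = [avg.copy() for _ in range(n_workers)]
    return workers[0]
```

The interleaving of local updates and averaging is what makes the analysis harder than for fully synchronous SGD, as the response notes.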



Appendix A Related Work of AUC Optimization

Neural Information Processing Systems

To this end, how to optimize AUC performance has attracted wide attention. As shown in Fig. 3(a), given an open-set sample (x, …), AUC suffers from the inconsistency property III. To be specific, according to the prediction process described in Sec. 2, if we select … As shown in Fig. 3(c), we have OpenAUC(h, r) … OpenAUC(h, r) = 1/N … C.5 Proof for Proposition 8. Proposition 8: Optimizing OpenAUC is equivalent to the following risk minimization problem: min … Meanwhile, the hyperparameter λ is searched in {0.1, 0.2, …}. During the test phase, open-set samples are available. In this section, we present the empirical results on fine-grained datasets.
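Since the excerpt above is fragmentary, a brief sketch of the quantities involved may help. The first function is the standard pairwise AUC; the second is an illustrative reading of an OpenAUC-style score, in which a (known, open-set) pair counts only if the known sample is classified correctly and its open-set score is lower than the open-set sample's. The exact pairing rule in the paper may differ; `correct`, `r_known`, and `r_open` are assumed names.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked
    correctly by the scorer, with ties counted as half."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def open_auc(correct, r_known, r_open):
    """Illustrative OpenAUC-style score: a pair contributes only when the
    known sample is correctly classified AND its open-set score r is
    below that of the open-set sample."""
    pair_ok = correct[:, None] & (r_known[:, None] < r_open[None, :])
    return float(pair_ok.mean())
```

Coupling ranking correctness with closed-set accuracy is what distinguishes such open-set metrics from plain AUC.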


d33174c464c877fb03e77efdab4ae804-AuthorFeedback.pdf

Neural Information Processing Systems

Our work "establishes interpretations of SGD and Adam-family optimizers from a Bayesian filtering perspective" (R3). It is "the first to demonstrate how viewing optimization as Bayesian inference requires modeling temporal dynamics … AdamW" (R4), and therefore explains the excellent performance of these SOTA methods. In the ideal case you shouldn't use a factorised model, and lines 77-81 aren't trying to motivate a factorised model. Also, see "Conclusions" above for non-factorised future work (… Khan et al. 2018), but we agree that its improvement is an important avenue for future research. Minor 1: Agreed, but a few people get very confused on this point.
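A concrete hook for the filtering reading mentioned above is that Adam's moment estimates are exponential moving averages, i.e. low-pass filters on the gradient signal. The sketch below is the standard Adam update (Kingma and Ba, 2015), shown for illustration; it is not the authors' derivation.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update. The EMAs m and v act as filters on the
    gradient and squared gradient, the entry point for a
    Bayesian-filtering interpretation."""
    m = b1 * m + (1 - b1) * g           # first-moment filter
    v = b2 * v + (1 - b2) * g ** 2      # second-moment filter
    m_hat = m / (1 - b1 ** t)           # bias correction for step t
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```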


ba95d78a7c942571185308775a97a3a0-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank the reviewers for their constructive comments. Below, we respond to their main comments. Note that each motif exhibits some distinct properties and can be considered a graph feature. With this in mind, we are not sure it is worth adding this baseline to the paper. We will make this clear in the revised manuscript.